Creating a CCGbank and a Wide-Coverage CCG Lexicon for German

نویسنده

  • Julia Hockenmaier
چکیده

We present an algorithm which creates a German CCGbank by translating the syntax graphs in the German Tiger corpus into CCG derivation trees. The resulting corpus contains 46,628 derivations, covering 95% of all complete sentences in Tiger. Lexicons extracted from this corpus contain correct lexical entries for 94% of all known tokens in unseen text.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Extending CCGbank with Quotes and Multi-modal CCG

CCGbank is an automatic conversion of the Penn Treebank to Combinatory Categorial Grammar (CCG). We present two extensions to CCGbank which involve manipulating its derivation and category structure. We discuss approaches for the automatic re-insertion of removed quote symbols and evaluate their impact on the performance of the C&C CCG parser. We also analyse CCGbank to extract a multi-modal CC...

متن کامل

Hindi CCGbank: CCG Treebank from the Hindi Dependency Treebank

In this paper, we present an approach for automatically creating a Combinatory Categorial Grammar (CCG) treebank from a dependency treebank for the Subject-Object-Verb language Hindi. Rather than a direct conversion from dependency trees to CCG trees, we propose a two stage approach: a language independent generic algorithm first extracts a CCG lexicon from the dependency treebank. A determinis...

متن کامل

Punctuation Normalisation for Cleaner Treebanks and Parsers

Although punctuation is pervasive in written text, their treatment in parsers and corpora is often second-class. We examine the treatment of commas in CCGbank, a wide-coverage corpus for Combinatory Categorial Grammar (CCG), reanalysing its comma structures in order to eliminate a class of redundant rules, obtaining a more consistent treebank. We then eliminate these rules from C&C, a wide-cove...

متن کامل

Semi-supervised lexical acquisition for wide-coverage parsing

State-of-the-art parsers suffer from incomplete lexicons, as evidenced by the fact that they all contain built-in methods for dealing with out-of-lexicon items at parse time. Since new labelled data is expensive to produce and no amount of it will conquer the long tail, we attempt to address this problem by leveraging the enormous amount of raw text available for free, and expanding the lexicon...

متن کامل

Chinese CCGbank: extracting CCG derivations from the Penn Chinese Treebank

Automated conversion has allowed the development of wide-coverage corpora for a variety of grammar formalisms without the expense of manual annotation. Analysing new languages also tests formalisms, exposing their strengths and weaknesses. We present Chinese CCGbank, a 760,000 word corpus annotated with Combinatory Categorial Grammar (CCG) derivations, induced automatically from the Penn Chines...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006